To analyze the ICU Admissions dataset to develop a predictive model for Vital Status.
2019-04-08
Author(s): Stanley Lemeshow, Daniel Teres, Jill Spitz Avrunin and Harris Pastides. Source: Journal of the American Statistical Association, Vol. 83, No. 402 (Jun., 1988), pp. 348-356
The Shapiro-Wilk test is a test of normality.
Ho: the variable is normally distributed
Ha: the variable is not normally distributed
Reject the null hypothesis and conclude that systolic bp is not normally distributed.
## ## Shapiro-Wilk normality test ## ## data: Systolic ## W = 0.98369, p-value = 0.0204
## [1] 200 179
Reject the null hypothesis and conclude that age is not normally distributed.
## ## Shapiro-Wilk normality test ## ## data: Age ## W = 0.92836, p-value = 2.507e-08
## [1] 23 97
Reject the null hypothesis and conclude that heart rate is not normally distributed.
## ## Shapiro-Wilk normality test ## ## data: HeartRate ## W = 0.98598, p-value = 0.04478
## [1] 125 48
None of the continuous variables were normally distributed.
The Wilcoxon rank-sum test (also known as the Mann-Whitney U test) is a non-parametric statistical hypothesis test used to compare two independent samples to assess whether their distributions differ in location. It is used here because none of the continuous variables were normally distributed.
Ho: the distributions are the same
Ha: the distributions are not the same
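As a minimal sketch of how the W statistic in the test output that follows is formed: the observations from both groups are ranked together, tied values receive the average rank, and W is the rank sum of the first group minus its smallest possible value. The sketch is written in Python rather than the R that produced the report's output, the function name `rank_sum_w` is illustrative, and the ages are made up rather than taken from the ICU dataset.

```python
def rank_sum_w(x, y):
    """Return W: the rank sum of x in the pooled sample minus n_x * (n_x + 1) / 2.

    Tied values receive the average (mid) rank of the positions they occupy.
    """
    pooled = sorted(list(x) + list(y))
    # Collect the 1-based positions occupied by each distinct value.
    positions = {}
    for i, value in enumerate(pooled, start=1):
        positions.setdefault(value, []).append(i)
    midrank = {value: sum(p) / len(p) for value, p in positions.items()}
    n_x = len(x)
    return sum(midrank[value] for value in x) - n_x * (n_x + 1) / 2


# Hypothetical ages for two small groups (not taken from the ICU dataset).
group_a = [23, 41, 41, 58]
group_b = [41, 63, 70, 77, 80]
w = rank_sum_w(group_a, group_b)
print(w)
```

W is 0 when every value in the first group lies below every value in the second, and reaches n_x * n_y when the ordering is fully reversed; values near the middle of that range give no evidence of a location shift.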
## No Yes ## 63 60
## ## Wilcoxon rank sum test with continuity correction ## ## data: Age by CPR ## W = 1273, p-value = 0.7775 ## alternative hypothesis: true location shift is not equal to 0
The null hypothesis cannot be rejected (p = 0.7775); we find no evidence that the distribution of Age differs between patients who did and did not receive CPR.
## Died Lived ## 68 61
## ## Wilcoxon rank sum test with continuity correction ## ## data: Age by Status ## W = 4031.5, p-value = 0.01112 ## alternative hypothesis: true location shift is not equal to 0
The null hypothesis is rejected (p = 0.01112); we conclude that the distribution of Age differs between patients who lived and those who died.
## No Yes ## 63 62
## ## Wilcoxon rank sum test with continuity correction ## ## data: Age by Cancer ## W = 1937, p-value = 0.5782 ## alternative hypothesis: true location shift is not equal to 0
The null hypothesis cannot be rejected (p = 0.5782); we find no evidence that the distribution of Age differs between patients with and without cancer.
## No Yes ## 88.5 106.0
## ## Wilcoxon rank sum test with continuity correction ## ## data: HeartRate by Infection ## W = 3056, p-value = 6.952e-06 ## alternative hypothesis: true location shift is not equal to 0
The null hypothesis is rejected (p = 6.952e-06); we conclude that the distribution of Heart Rate differs between patients with and without an infection.
In conclusion, the distribution of Heart Rate differs between patients with and without an infection, and the distribution of Age differs between patients who lived and those who died. No difference in the distribution of Age was detected between patients with and without cancer, or between patients who did and did not receive CPR.
The chi-square test of independence assesses whether two categorical variables are associated.
Ho: No association between the two variables
Ha: An association between the two variables
| Sex | Status: Died | Status: Lived | Total |
|---|---|---|---|
| Female | 16 (40%) | 60 (37.5%) | 76 (38%) |
| Male | 24 (60%) | 100 (62.5%) | 124 (62%) |
| Total | 40 (100%) | 160 (100%) | 200 (100%) |

χ²=0.012 · df=1 · φ=0.021 · p=0.913
| Race | Status: Died | Status: Lived | Total |
|---|---|---|---|
| Black | 1 (2.5%) | 14 (8.8%) | 15 (7.5%) |
| Other | 2 (5%) | 8 (5%) | 10 (5%) |
| White | 37 (92.5%) | 138 (86.2%) | 175 (87.5%) |
| Total | 40 (100%) | 160 (100%) | 200 (100%) |

χ²=1.810 · df=2 · Cramer's V=0.095 · Fisher's p=0.494
| Service | Status: Died | Status: Lived | Total |
|---|---|---|---|
| Medical | 26 (65%) | 67 (41.9%) | 93 (46.5%) |
| Surgical | 14 (35%) | 93 (58.1%) | 107 (53.5%) |
| Total | 40 (100%) | 160 (100%) | 200 (100%) |

χ²=5.981 · df=1 · φ=0.185 · p=0.014
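The statistics under the Service table can be reproduced by hand from its four cell counts. The sketch below is in Python rather than the R used for the report's output, and it rests on two assumptions that the report does not state but that happen to match the reported figures: the chi-square statistic carries Yates' continuity correction, while φ is computed from the uncorrected statistic. The function name `chi_square_2x2` is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Yates-corrected chi-square, phi, and p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    denom = row1 * row2 * col1 * col2
    diff = abs(a * d - b * c)
    # Continuity correction (assumed): shrink |ad - bc| by n/2, never below zero.
    chi2 = n * max(diff - n / 2, 0) ** 2 / denom
    # Phi from the uncorrected statistic (assumed): sqrt(chi2_uncorrected / n).
    phi = math.sqrt(diff ** 2 / denom)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, phi, p


# Service by Status: Medical (26 died, 67 lived), Surgical (14 died, 93 lived).
chi2, phi, p = chi_square_2x2(26, 67, 14, 93)
print(round(chi2, 3), round(phi, 3), round(p, 3))
```

The same function applies to any of the other 2x2 tables in this section; the Race table has three rows and would need the general expected-count form instead.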
| Cancer | Status: Died | Status: Lived | Total |
|---|---|---|---|
| No | 36 (90%) | 144 (90%) | 180 (90%) |
| Yes | 4 (10%) | 16 (10%) | 20 (10%) |
| Total | 40 (100%) | 160 (100%) | 200 (100%) |

χ²=0.000 · df=1 · φ=0.000 · Fisher's p=1.000
| Renal | Status: Died | Status: Lived | Total |
|---|---|---|---|
| No | 32 (80%) | 149 (93.1%) | 181 (90.5%) |
| Yes | 8 (20%) | 11 (6.9%) | 19 (9.5%) |
| Total | 40 (100%) | 160 (100%) | 200 (100%) |

χ²=4.976 · df=1 · φ=0.179 · Fisher's p=0.029
| Infection | Status: Died | Status: Lived | Total |
|---|---|---|---|
| No | 16 (40%) | 100 (62.5%) | 116 (58%) |
| Yes | 24 (60%) | 60 (37.5%) | 84 (42%) |
| Total | 40 (100%) | 160 (100%) | 200 (100%) |

χ²=5.759 · df=1 · φ=0.182 · p=0.016
| CPR | Status: Died | Status: Lived | Total |
|---|---|---|---|
| No | 33 (82.5%) | 154 (96.2%) | 187 (93.5%) |
| Yes | 7 (17.5%) | 6 (3.8%) | 13 (6.5%) |
| Total | 40 (100%) | 160 (100%) | 200 (100%) |

χ²=7.821 · df=1 · φ=0.223 · Fisher's p=0.005
| Previous | Status: Died | Status: Lived | Total |
|---|---|---|---|
| No | 33 (82.5%) | 137 (85.6%) | 170 (85%) |
| Yes | 7 (17.5%) | 23 (14.4%) | 30 (15%) |
| Total | 40 (100%) | 160 (100%) | 200 (100%) |

χ²=0.061 · df=1 · φ=0.035 · Fisher's p=0.624
| Type | Status: Died | Status: Lived | Total |
|---|---|---|---|
| Elective | 2 (5%) | 51 (31.9%) | 53 (26.5%) |
| Emergency | 38 (95%) | 109 (68.1%) | 147 (73.5%) |
| Total | 40 (100%) | 160 (100%) | 200 (100%) |

χ²=10.527 · df=1 · φ=0.244 · p=0.001
| Fracture | Status: Died | Status: Lived | Total |
|---|---|---|---|
| No | 37 (92.5%) | 148 (92.5%) | 185 (92.5%) |
| Yes | 3 (7.5%) | 12 (7.5%) | 15 (7.5%) |
| Total | 40 (100%) | 160 (100%) | 200 (100%) |

χ²=0.000 · df=1 · φ=0.000 · Fisher's p=1.000
| PO2 | Status: Died | Status: Lived | Total |
|---|---|---|---|
| No | 35 (87.5%) | 149 (93.1%) | 184 (92%) |
| Yes | 5 (12.5%) | 11 (6.9%) | 16 (8%) |
| Total | 40 (100%) | 160 (100%) | 200 (100%) |

χ²=0.718 · df=1 · φ=0.083 · Fisher's p=0.324
| PH | Status: Died | Status: Lived | Total |
|---|---|---|---|
| No | 36 (90%) | 151 (94.4%) | 187 (93.5%) |
| Yes | 4 (10%) | 9 (5.6%) | 13 (6.5%) |
| Total | 40 (100%) | 160 (100%) | 200 (100%) |

χ²=0.416 · df=1 · φ=0.071 · Fisher's p=0.297
| PCO2 | Status: Died | Status: Lived | Total |
|---|---|---|---|
| No | 36 (90%) | 144 (90%) | 180 (90%) |
| Yes | 4 (10%) | 16 (10%) | 20 (10%) |
| Total | 40 (100%) | 160 (100%) | 200 (100%) |

χ²=0.000 · df=1 · φ=0.000 · Fisher's p=1.000
| Bicarbonate | Status: Died | Status: Lived | Total |
|---|---|---|---|
| No | 35 (87.5%) | 150 (93.8%) | 185 (92.5%) |
| Yes | 5 (12.5%) | 10 (6.2%) | 15 (7.5%) |
| Total | 40 (100%) | 160 (100%) | 200 (100%) |

χ²=1.014 · df=1 · φ=0.095 · Fisher's p=0.187
| Creatinine | Status: Died | Status: Lived | Total |
|---|---|---|---|
| No | 35 (87.5%) | 155 (96.9%) | 190 (95%) |
| Yes | 5 (12.5%) | 5 (3.1%) | 10 (5%) |
| Total | 40 (100%) | 160 (100%) | 200 (100%) |

χ²=4.112 · df=1 · φ=0.172 · Fisher's p=0.029
| Consciousness | Status: Died | Status: Lived | Total |
|---|---|---|---|
| Conscious | 8 (20%) | 2 (1.2%) | 10 (5%) |
| Unconscious | 32 (80%) | 158 (98.8%) | 190 (95%) |
| Total | 40 (100%) | 160 (100%) | 200 (100%) |

χ²=19.901 · df=1 · φ=0.344 · Fisher's p<0.001
In conclusion, Service, Renal, Infection, CPR, Type, Creatinine and Consciousness are each associated with Status at the 0.05 level.
Correlational analyses are used to look at the relationships between two variables to determine if the two variables are related to each other.
Since Age, Heart Rate and Systolic blood pressure are all not normally distributed, Spearman's rank correlation coefficient was used to test for correlation between these variables.
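Spearman's coefficient is the ordinary Pearson correlation computed on ranks instead of raw values, which is why it does not require normality. A minimal sketch in Python (not the R used for the report's output), with illustrative function names and made-up values standing in for the ICU data:

```python
import math


def midranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the midranks."""
    return pearson(midranks(x), midranks(y))


# Hypothetical ages and systolic pressures (not taken from the ICU dataset).
ages = [27, 59, 77, 54, 87]
systolic = [142, 112, 100, 142, 110]
print(spearman(ages, systolic))
```

Because only the ordering matters, any monotone transformation of either variable leaves the coefficient unchanged.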
From the prior analysis, we conclude that Systolic, Heart Rate and Age are not correlated with one another, so collinearity among them is not a concern when they are used as predictor variables. The two categorical variables Type and Service are correlated (coefficient of -0.54) and could influence each other if both are used as predictor variables.
A logistic regression model is developed in this section to find which variables in the dataset predict status.
Ho: None of the independent variables in the data set predict hospital mortality of ICU patients, based on information available at the time of ICU admissions.
Ha: Some of the independent variables in the data set do predict hospital mortality of ICU patients, based on information available at the time of ICU admissions.
Therefore, we can state the predictor variables of CancerYes, Age, TypeEmergency, and ConsciousnessConscious have a statistically significant relation with Vital Status.
Based on the odds ratios, Level of Consciousness at admission (no coma) is the greatest predictor (18.95) of Vital Status.
In statistics, the Bayesian information criterion (BIC) is a criterion for model selection among a finite set of models. It is based, in part, on the likelihood function, and it is closely related to the Akaike information criterion (AIC).
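Both criteria penalize the maximized log-likelihood by the number of estimated parameters k; BIC's penalty also grows with the sample size n. A minimal sketch of the two standard formulas in Python (not the R used for the report's output), evaluated on hypothetical inputs rather than on any model from this analysis:

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion: k * ln(n) - 2 * log-likelihood."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical model: log-likelihood -70.0, 5 parameters, 200 observations.
print(aic(-70.0, 5), round(bic(-70.0, 5, 200), 2))
```

For any sample larger than about 7 observations, ln(n) exceeds 2, so BIC penalizes each extra parameter more heavily than AIC and tends to select smaller models.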
We then verified our findings using stepwise (forward-backward) selection with the BIC as the selection criterion.
The BIC model identified the following independent variables as statistically significant:
1. ConsciousnessUnconscious
2. TypeEmergency
3. CancerYes
4. Age
We achieved a lower AIC score of 149.1
The dataset was split into a training sample (70%) and a testing sample (30%).
The training sample size is 140 and the testing sample size is 60.
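A 70/30 split of this kind can be sketched as a seeded shuffle of row indices. The example is in Python rather than the R used for the report's output; the function name `split_indices` and the seed value are hypothetical, since the report does not say how the split was drawn.

```python
import random


def split_indices(n_rows, train_fraction, seed):
    """Shuffle row indices reproducibly and split them into train and test sets."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_train = round(n_rows * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# 200 rows as in the ICU dataset; the seed is an arbitrary illustrative choice.
train_idx, test_idx = split_indices(200, 0.70, seed=2019)
print(len(train_idx), len(test_idx))
```

Fixing the seed matters here because every downstream figure (odds ratios, p-values, AIC) depends on which rows land in the training sample.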
The intent of our training model is to predict the odds of a patient having Vital Status = 1 (died), based on the information available at the time of ICU admission, using the best predictor variables: CancerYes, Age, TypeEmergency, ConsciousnessConscious, and Systolic.
Outcome: Status

| Predictors | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 7.99 | 0.05 – 1269.19 | 0.422 |
| Cancer (Yes) | 0.07 | 0.01 – 0.84 | 0.037 |
| Age | 0.96 | 0.94 – 0.99 | 0.010 |
| Type (Emergency) | 0.02 | 0.00 – 0.47 | 0.014 |
| Consciousness (Unconscious) | 21.24 | 2.27 – 199.17 | 0.007 |
| Systolic | 1.02 | 1.00 – 1.04 | 0.057 |
| Observations | 140 | | |
| Cox & Snell's R2 / Nagelkerke's R2 | 0.274 / 0.424 | | |
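The odds ratios and intervals in the table are exponentiated versions of the underlying log-odds coefficients. As a rough consistency check, the sketch below backs out the coefficient, its standard error, and a two-sided p-value from one reported row. It is written in Python rather than the R used for the report's output, and it assumes the intervals are symmetric Wald intervals on the log scale, which the report does not state; the function name is illustrative.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def wald_from_odds_ratio(odds_ratio, ci_low, ci_high):
    """Recover coefficient, standard error, z, and p from an OR and its 95% CI.

    Assumes a Wald interval: exp(beta - Z_95 * se) to exp(beta + Z_95 * se).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, z, p


# The Unconscious row of the table: OR 21.24, CI 2.27 to 199.17.
beta, se, z, p = wald_from_odds_ratio(21.24, 2.27, 199.17)
print(round(beta, 2), round(se, 2), round(p, 3))
```

Under that assumption the recovered p-value agrees with the 0.007 reported for this row. Rows whose bounds are rounded to 0.00 or 1.00 carry too little precision for the same check.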
For our training model, the following variables are statistically significant at an alpha level of 0.05 for predicting hospital mortality of ICU patients. They are listed from most to least significant, using the p-values in the table above:
1. Consciousness (Unconscious): 0.007
2. Age: 0.010
3. Type (Emergency): 0.014
4. Cancer (Yes): 0.037

Systolic, with p = 0.057 in the table above, falls just short of the 0.05 threshold.
Null Deviance > Residual Deviance?
YES, decreases 55 points, indicating a good model.
AIC Value = 101.18
This is the lowest AIC so far. A lower AIC value indicates an improvement in the model, provided the models being compared are fit to the same observations.
Confidence Intervals:
Apart from the intercept, every confidence interval excludes 1 except that of Systolic (1.00 – 1.04, p = 0.057). An interval that excludes 1 indicates that the variable is statistically significant.
Our training model is an improvement upon our BIC Logistic Regression Model as the significance of each variable based upon p-values aligns with the predictability of the odds ratios. In addition, our AIC value is the lowest at a value of 101.18, with the inclusion of Systolic.
Ho: The predicted values in the training model cannot be used to predict Vital Status and/or hospital mortality of ICU patients, based on information available at the time of ICU admissions; the predictions of the testing model are not statistically significant.
Ha: The predicted values in the training model can be used to predict Vital Status and/or hospital mortality of ICU patients, based on information available at the time of ICU admissions; the predictions of the testing model are statistically significant.
The relevant null hypothesis is Ho: the predicted values of Status and the actual values are not correlated.
## ## Pearson's product-moment correlation ## ## data: x$actual and x$predicted ## t = 2.1478, df = 58, p-value = 0.03592 ## alternative hypothesis: true correlation is not equal to 0 ## 95 percent confidence interval: ## 0.01880832 0.49148596 ## sample estimates: ## cor ## 0.2714367
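The t statistic in this output follows directly from the sample correlation r and the number of paired observations n, through t = r * sqrt((n - 2) / (1 - r^2)). A short Python check (not the R used for the report's output), taking n = 60 from the testing sample size; the function name is illustrative:

```python
import math


def correlation_t(r, n):
    """t statistic and degrees of freedom for testing a Pearson correlation of zero."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


# r as reported in the output above; n is the testing sample size.
t, df = correlation_t(0.2714367, 60)
print(round(t, 4), df)
```

This reproduces the reported t = 2.1478 on 58 degrees of freedom.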
Model to predict vital status included (in order of high to low statistical significance):
Accuracy of training model to predict mortality = 15.9%
Based on our analysis, we DO NOT reject the null hypothesis.